
fix: resolve FixedWindowCallRatePolicy deadlock on 429 without ratelimit-reset header#985

Draft
devin-ai-integration[bot] wants to merge 1 commit into main from devin/1775687146-fix-fixed-window-deadlock

@devin-ai-integration
Contributor

Summary

Fixes a deadlock in FixedWindowCallRatePolicy that caused connectors to hang for ~10 days (until the heartbeat monitor killed them) when a 429 response arrived from an API that lacks a ratelimit-reset header.

The deadlock was caused by four interacting bugs:

  1. next_reset_ts initialized 10 days in the future (model_to_component_factory.py): Changed to now + period instead of now + 10 days.
  2. retry-after header ignored (call_rate.py): get_reset_ts_from_response() now falls back to the standard retry-after header (interpreted as seconds) when ratelimit-reset is absent.
  3. _update_current_window() single-step advance (call_rate.py): Changed the if to a while loop so the window advances past every elapsed period, not just one.
  4. No cap on sleep duration (call_rate.py): _do_acquire() now caps sleep to 600 seconds with a warning log.
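
To make fixes 1, 3 and 4 concrete, here is a minimal sketch under stated assumptions. FixedWindowSketch, its methods, MAX_SLEEP_SECONDS and the injectable now clock are hypothetical names invented for illustration; this is not the CDK's FixedWindowCallRatePolicy, only a simplified model of the window arithmetic the list describes. It also rejects a non-positive period, the one input that would make a while-based catch-up loop never terminate.

```python
import logging
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

logger = logging.getLogger(__name__)

# Hypothetical constant mirroring the 600-second ceiling described in fix 4.
MAX_SLEEP_SECONDS = 600.0


def _utc_now() -> datetime:
    return datetime.now(timezone.utc)


class FixedWindowSketch:
    """Simplified, hypothetical model of a fixed-window rate limiter.

    Not the CDK's FixedWindowCallRatePolicy; it only illustrates fixes 1, 3 and 4.
    """

    def __init__(self, period: timedelta, call_limit: int,
                 now: Callable[[], datetime] = _utc_now) -> None:
        if period <= timedelta(0):
            # A zero or negative period would make the catch-up loop below never end.
            raise ValueError("period must be a positive timedelta")
        self._offset = period
        self._call_limit = call_limit
        self._calls_num = 0
        self._now = now
        # Fix 1: the first window closes one period from now, not 10 days from now.
        self._next_reset_ts = self._now() + period

    def _update_current_window(self) -> None:
        now = self._now()
        # Fix 3: `while` rather than `if`, so every fully elapsed period is skipped.
        while now >= self._next_reset_ts:
            self._next_reset_ts += self._offset
            self._calls_num = 0

    def record_call(self) -> None:
        self._update_current_window()
        self._calls_num += 1

    def update_from_response(self, next_reset_ts: Optional[datetime]) -> None:
        # On a 429, treat the window as exhausted; move the reset only if the API said when.
        self._calls_num = self._call_limit
        if next_reset_ts is not None:
            self._next_reset_ts = next_reset_ts

    def sleep_seconds(self) -> float:
        """Seconds to wait before the next call, never more than MAX_SLEEP_SECONDS."""
        self._update_current_window()
        if self._calls_num < self._call_limit:
            return 0.0
        wait = (self._next_reset_ts - self._now()).total_seconds()
        if wait > MAX_SLEEP_SECONDS:
            # Fix 4: cap the sleep and log it, instead of blocking for days.
            logger.warning("Capping rate-limit sleep from %.0fs to %.0fs", wait, MAX_SLEEP_SECONDS)
            wait = MAX_SLEEP_SECONDS
        return max(wait, 0.0)
```

With a 60-second period and a fake clock advanced by five minutes, a single sleep_seconds() call moves the reset time forward five periods in one go; a single if would have moved it only one period and left it behind the clock. A reset timestamp ten days out is clamped to 600 seconds.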

Resolves https://github.com/airbytehq/oncall/issues/11924.

Review & Testing Checklist for Human

  • while loop safety in _update_current_window(): Verify that _offset (the period) can never be zero or negative, which would cause an infinite loop. It comes from parse_duration(model.period) — confirm this always returns a positive timedelta.
  • retry-after header parsing: The implementation only handles the integer-seconds format, not the HTTP-date format also allowed by RFC 7231. Verify that this is acceptable for the APIs Airbyte connectors interact with, or whether HTTP-date parsing should be added.
  • 600-second sleep cap: This is hardcoded and not configurable. Confirm this is a reasonable default — some APIs might legitimately request longer waits, though 10 minutes seems safe as a ceiling.
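
The retry-after question above can be explored with a small helper. parse_retry_after is a hypothetical function, not part of the CDK or this PR; it accepts both forms RFC 7231 (section 7.1.3) allows for Retry-After: the integer delay-seconds form the PR already handles, and the HTTP-date form it currently does not, parsed here with the standard library's email.utils.parsedate_to_datetime. It assumes the caller passes a timezone-aware now.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], now: datetime) -> Optional[datetime]:
    """Hypothetical helper: turn a Retry-After header value into a reset timestamp.

    RFC 7231 section 7.1.3 allows two forms:
      * delay-seconds, e.g. "120"
      * HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    `now` must be timezone-aware. Returns None if the value is missing or unparseable.
    """
    if value is None:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        # delay-seconds: the form the PR's fallback already handles.
        return now + timedelta(seconds=int(value))
    try:
        # HTTP-date: the form the PR does not handle yet.
        reset = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Pythons raise TypeError on bad input, newer ones raise ValueError.
        return None
    if reset.tzinfo is None:
        # Dates with a "-0000" offset parse as naive; treat them as UTC.
        reset = reset.replace(tzinfo=timezone.utc)
    # A date in the past means "retry now", never a negative wait.
    return max(reset, now)
```

Supporting the HTTP-date form costs only a few lines with the standard library, so adding it is cheap if any target API is found to send dates rather than seconds.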

Suggested test plan: Run a connector that uses FixedWindowCallRatePolicy against an API that returns 429 without ratelimit-reset headers (or mock one) and verify the connector retries within seconds rather than hanging indefinitely.
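
For the "or mock one" option, a local stand-in API can be built from the Python standard library alone. This is an assumed setup, not an existing Airbyte test fixture: RateLimitedHandler, start_mock_api and probe are hypothetical names, and the Retry-After value of 2 seconds is arbitrary. Every request receives a 429 that carries retry-after but no ratelimit-reset header.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, Tuple


class RateLimitedHandler(BaseHTTPRequestHandler):
    """Mock API: always answers 429 with Retry-After and no ratelimit-reset header."""

    def do_GET(self) -> None:
        self.send_response(429)
        self.send_header("Retry-After", "2")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        pass  # keep test output quiet


def start_mock_api() -> HTTPServer:
    # Port 0 asks the OS for any free port; serve requests on a background thread.
    server = HTTPServer(("127.0.0.1", 0), RateLimitedHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(url: str) -> Tuple[int, Dict[str, str]]:
    """Fetch `url` once and return (status code, response headers)."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status, dict(resp.headers)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx responses; the status and headers are still available.
        return err.code, dict(err.headers)
```

Pointing a connector's base URL at server.server_address (for example through a local test config) should then show the connector waiting on the order of seconds before retrying, rather than hanging, if the fix behaves as described.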

Notes

  • This is a CDK-level fix, not a connector-level change. No connector version bumps or metadata updates are needed.
  • Not a breaking change — only internal behavior is modified (bug fixes with no public API changes).

Link to Devin session: https://app.devin.ai/sessions/6e4ee57d58264eca9e7f33e431983f9c

…mit-reset header

Four interacting bugs caused connectors using FixedWindowCallRatePolicy to
deadlock when a 429 response arrived from an API that lacks a ratelimit-reset
header:

1. next_reset_ts initialized 10 days in the future instead of now + period
2. get_reset_ts_from_response() never fell back to retry-after header
3. _update_current_window() only advanced by one period instead of catching up
4. _do_acquire() had no upper bound on sleep duration

Fixes:
- Initialize next_reset_ts to now + period in model_to_component_factory.py
- Fall back to retry-after header in get_reset_ts_from_response()
- Use while loop in _update_current_window() to advance past all elapsed periods
- Cap maximum sleep in _do_acquire() to 600 seconds with a warning log

Co-Authored-By: bot_apk <apk@cognition.ai>
@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions

github-actions bot commented Apr 8, 2026

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.


Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1775687146-fix-fixed-window-deadlock#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1775687146-fix-fixed-window-deadlock

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

@github-actions

github-actions bot commented Apr 8, 2026

PyTest Results (Fast)

4,019 tests (+7): 4,008 passed ✅ (+7), 11 skipped 💤 (±0), 0 failed ❌ (±0)
1 suite (±0), 1 file (±0); run time 7m 34s ⏱️ (-9s)

Results for commit c1722c3. ± Comparison against base commit 4aaafcf.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 8, 2026

PyTest Results (Full)

4,022 tests (+7): 4,010 passed ✅ (+7), 12 skipped 💤 (±0), 0 failed ❌ (±0)
1 suite (±0), 1 file (±0); run time 11m 17s ⏱️ (+18s)

Results for commit c1722c3. ± Comparison against base commit 4aaafcf.
